Consider the "Ames Housing Dataset". With the goal of developing a model to predict property sale prices, complete the items below and deliver the analysis as a PowerPoint or PDF file.
import pandas as pd
import numpy as np
import statsmodels
import seaborn
from matplotlib import pyplot as plt
pd.options.display.max_columns = 100
df = pd.read_csv('base_1ah.csv')
print(df.shape)
(1460, 81)
df.head()
| | Id | MSSubClass | MSZoning | LotFrontage | LotArea | Street | Alley | LotShape | LandContour | Utilities | LotConfig | LandSlope | Neighborhood | Condition1 | Condition2 | BldgType | HouseStyle | OverallQual | OverallCond | YearBuilt | YearRemodAdd | RoofStyle | RoofMatl | Exterior1st | Exterior2nd | MasVnrType | MasVnrArea | ExterQual | ExterCond | Foundation | BsmtQual | BsmtCond | BsmtExposure | BsmtFinType1 | BsmtFinSF1 | BsmtFinType2 | BsmtFinSF2 | BsmtUnfSF | TotalBsmtSF | Heating | HeatingQC | CentralAir | Electrical | 1stFlrSF | 2ndFlrSF | LowQualFinSF | GrLivArea | BsmtFullBath | BsmtHalfBath | FullBath | HalfBath | BedroomAbvGr | KitchenAbvGr | KitchenQual | TotRmsAbvGrd | Functional | Fireplaces | FireplaceQu | GarageType | GarageYrBlt | GarageFinish | GarageCars | GarageArea | GarageQual | GarageCond | PavedDrive | WoodDeckSF | OpenPorchSF | EnclosedPorch | 3SsnPorch | ScreenPorch | PoolArea | PoolQC | Fence | MiscFeature | MiscVal | MoSold | YrSold | SaleType | SaleCondition | SalePrice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 60 | RL | 65.0 | 8450 | Pave | NaN | Reg | Lvl | AllPub | Inside | Gtl | CollgCr | Norm | Norm | 1Fam | 2Story | 7 | 5 | 2003 | 2003 | Gable | CompShg | VinylSd | VinylSd | BrkFace | 196.0 | Gd | TA | PConc | Gd | TA | No | GLQ | 706 | Unf | 0 | 150 | 856 | GasA | Ex | Y | SBrkr | 856 | 854 | 0 | 1710 | 1 | 0 | 2 | 1 | 3 | 1 | Gd | 8 | Typ | 0 | NaN | Attchd | 2003.0 | RFn | 2 | 548 | TA | TA | Y | 0 | 61 | 0 | 0 | 0 | 0 | NaN | NaN | NaN | 0 | 2 | 2008 | WD | Normal | 208500 |
| 1 | 2 | 20 | RL | 80.0 | 9600 | Pave | NaN | Reg | Lvl | AllPub | FR2 | Gtl | Veenker | Feedr | Norm | 1Fam | 1Story | 6 | 8 | 1976 | 1976 | Gable | CompShg | MetalSd | MetalSd | None | 0.0 | TA | TA | CBlock | Gd | TA | Gd | ALQ | 978 | Unf | 0 | 284 | 1262 | GasA | Ex | Y | SBrkr | 1262 | 0 | 0 | 1262 | 0 | 1 | 2 | 0 | 3 | 1 | TA | 6 | Typ | 1 | TA | Attchd | 1976.0 | RFn | 2 | 460 | TA | TA | Y | 298 | 0 | 0 | 0 | 0 | 0 | NaN | NaN | NaN | 0 | 5 | 2007 | WD | Normal | 181500 |
| 2 | 3 | 60 | RL | 68.0 | 11250 | Pave | NaN | IR1 | Lvl | AllPub | Inside | Gtl | CollgCr | Norm | Norm | 1Fam | 2Story | 7 | 5 | 2001 | 2002 | Gable | CompShg | VinylSd | VinylSd | BrkFace | 162.0 | Gd | TA | PConc | Gd | TA | Mn | GLQ | 486 | Unf | 0 | 434 | 920 | GasA | Ex | Y | SBrkr | 920 | 866 | 0 | 1786 | 1 | 0 | 2 | 1 | 3 | 1 | Gd | 6 | Typ | 1 | TA | Attchd | 2001.0 | RFn | 2 | 608 | TA | TA | Y | 0 | 42 | 0 | 0 | 0 | 0 | NaN | NaN | NaN | 0 | 9 | 2008 | WD | Normal | 223500 |
| 3 | 4 | 70 | RL | 60.0 | 9550 | Pave | NaN | IR1 | Lvl | AllPub | Corner | Gtl | Crawfor | Norm | Norm | 1Fam | 2Story | 7 | 5 | 1915 | 1970 | Gable | CompShg | Wd Sdng | Wd Shng | None | 0.0 | TA | TA | BrkTil | TA | Gd | No | ALQ | 216 | Unf | 0 | 540 | 756 | GasA | Gd | Y | SBrkr | 961 | 756 | 0 | 1717 | 1 | 0 | 1 | 0 | 3 | 1 | Gd | 7 | Typ | 1 | Gd | Detchd | 1998.0 | Unf | 3 | 642 | TA | TA | Y | 0 | 35 | 272 | 0 | 0 | 0 | NaN | NaN | NaN | 0 | 2 | 2006 | WD | Abnorml | 140000 |
| 4 | 5 | 60 | RL | 84.0 | 14260 | Pave | NaN | IR1 | Lvl | AllPub | FR2 | Gtl | NoRidge | Norm | Norm | 1Fam | 2Story | 8 | 5 | 2000 | 2000 | Gable | CompShg | VinylSd | VinylSd | BrkFace | 350.0 | Gd | TA | PConc | Gd | TA | Av | GLQ | 655 | Unf | 0 | 490 | 1145 | GasA | Ex | Y | SBrkr | 1145 | 1053 | 0 | 2198 | 1 | 0 | 2 | 1 | 4 | 1 | Gd | 9 | Typ | 1 | TA | Attchd | 2000.0 | RFn | 3 | 836 | TA | TA | Y | 192 | 84 | 0 | 0 | 0 | 0 | NaN | NaN | NaN | 0 | 12 | 2008 | WD | Normal | 250000 |
df = df.set_index('Id')
1.1 Descriptive statistics: frequency, proportion, mean (x̄), standard deviation (s), quartiles (Q1, median x̃, Q3) (1.0)
Checking the variable types in base_1ah
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1460 entries, 1 to 1460
Data columns (total 80 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 MSSubClass 1460 non-null int64
1 MSZoning 1460 non-null object
2 LotFrontage 1201 non-null float64
3 LotArea 1460 non-null int64
4 Street 1460 non-null object
5 Alley 91 non-null object
6 LotShape 1460 non-null object
7 LandContour 1460 non-null object
8 Utilities 1460 non-null object
9 LotConfig 1460 non-null object
10 LandSlope 1460 non-null object
11 Neighborhood 1460 non-null object
12 Condition1 1460 non-null object
13 Condition2 1460 non-null object
14 BldgType 1460 non-null object
15 HouseStyle 1460 non-null object
16 OverallQual 1460 non-null int64
17 OverallCond 1460 non-null int64
18 YearBuilt 1460 non-null int64
19 YearRemodAdd 1460 non-null int64
20 RoofStyle 1460 non-null object
21 RoofMatl 1460 non-null object
22 Exterior1st 1460 non-null object
23 Exterior2nd 1460 non-null object
24 MasVnrType 1452 non-null object
25 MasVnrArea 1452 non-null float64
26 ExterQual 1460 non-null object
27 ExterCond 1460 non-null object
28 Foundation 1460 non-null object
29 BsmtQual 1423 non-null object
30 BsmtCond 1423 non-null object
31 BsmtExposure 1422 non-null object
32 BsmtFinType1 1423 non-null object
33 BsmtFinSF1 1460 non-null int64
34 BsmtFinType2 1422 non-null object
35 BsmtFinSF2 1460 non-null int64
36 BsmtUnfSF 1460 non-null int64
37 TotalBsmtSF 1460 non-null int64
38 Heating 1460 non-null object
39 HeatingQC 1460 non-null object
40 CentralAir 1460 non-null object
41 Electrical 1459 non-null object
42 1stFlrSF 1460 non-null int64
43 2ndFlrSF 1460 non-null int64
44 LowQualFinSF 1460 non-null int64
45 GrLivArea 1460 non-null int64
46 BsmtFullBath 1460 non-null int64
47 BsmtHalfBath 1460 non-null int64
48 FullBath 1460 non-null int64
49 HalfBath 1460 non-null int64
50 BedroomAbvGr 1460 non-null int64
51 KitchenAbvGr 1460 non-null int64
52 KitchenQual 1460 non-null object
53 TotRmsAbvGrd 1460 non-null int64
54 Functional 1460 non-null object
55 Fireplaces 1460 non-null int64
56 FireplaceQu 770 non-null object
57 GarageType 1379 non-null object
58 GarageYrBlt 1379 non-null float64
59 GarageFinish 1379 non-null object
60 GarageCars 1460 non-null int64
61 GarageArea 1460 non-null int64
62 GarageQual 1379 non-null object
63 GarageCond 1379 non-null object
64 PavedDrive 1460 non-null object
65 WoodDeckSF 1460 non-null int64
66 OpenPorchSF 1460 non-null int64
67 EnclosedPorch 1460 non-null int64
68 3SsnPorch 1460 non-null int64
69 ScreenPorch 1460 non-null int64
70 PoolArea 1460 non-null int64
71 PoolQC 7 non-null object
72 Fence 281 non-null object
73 MiscFeature 54 non-null object
74 MiscVal 1460 non-null int64
75 MoSold 1460 non-null int64
76 YrSold 1460 non-null int64
77 SaleType 1460 non-null object
78 SaleCondition 1460 non-null object
79 SalePrice 1460 non-null int64
dtypes: float64(3), int64(34), object(43)
memory usage: 923.9+ KB
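The `df.info()` output shows several columns with many missing values; PoolQC, for example, has only 7 non-null entries out of 1460 (about 99.5% missing). A minimal sketch of a missing-value summary follows; `missing_summary` is a hypothetical helper, and the toy frame only stands in for `base_1ah.csv`.

```python
import numpy as np
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, most incomplete first."""
    n_missing = frame.isna().sum()
    summary = pd.DataFrame({
        "n_missing": n_missing,
        "pct_missing": 100 * n_missing / len(frame),
    })
    # Keep only the columns that actually have gaps
    summary = summary[summary["n_missing"] > 0]
    return summary.sort_values("n_missing", ascending=False)


# Toy frame standing in for the real data (hypothetical values)
toy = pd.DataFrame({
    "PoolQC": [np.nan, np.nan, "Gd", np.nan],
    "LotFrontage": [65.0, np.nan, 68.0, 60.0],
    "LotArea": [8450, 9600, 11250, 9550],
})
print(missing_summary(toy))
```

On the real data, `missing_summary(df)` would rank PoolQC, MiscFeature, Alley and Fence first, consistent with the non-null counts above. For several of these columns NaN means the feature is absent (no pool, no alley access) rather than unknown.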
Descriptive analysis of the numeric variables
df.describe()
| | MSSubClass | LotFrontage | LotArea | OverallQual | OverallCond | YearBuilt | YearRemodAdd | MasVnrArea | BsmtFinSF1 | BsmtFinSF2 | BsmtUnfSF | TotalBsmtSF | 1stFlrSF | 2ndFlrSF | LowQualFinSF | GrLivArea | BsmtFullBath | BsmtHalfBath | FullBath | HalfBath | BedroomAbvGr | KitchenAbvGr | TotRmsAbvGrd | Fireplaces | GarageYrBlt | GarageCars | GarageArea | WoodDeckSF | OpenPorchSF | EnclosedPorch | 3SsnPorch | ScreenPorch | PoolArea | MiscVal | MoSold | YrSold | SalePrice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 1460.000000 | 1201.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1452.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1379.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 |
| mean | 56.897260 | 70.049958 | 10516.828082 | 6.099315 | 5.575342 | 1971.267808 | 1984.865753 | 103.685262 | 443.639726 | 46.549315 | 567.240411 | 1057.429452 | 1162.626712 | 346.992466 | 5.844521 | 1515.463699 | 0.425342 | 0.057534 | 1.565068 | 0.382877 | 2.866438 | 1.046575 | 6.517808 | 0.613014 | 1978.506164 | 1.767123 | 472.980137 | 94.244521 | 46.660274 | 21.954110 | 3.409589 | 15.060959 | 2.758904 | 43.489041 | 6.321918 | 2007.815753 | 180921.195890 |
| std | 42.300571 | 24.284752 | 9981.264932 | 1.382997 | 1.112799 | 30.202904 | 20.645407 | 181.066207 | 456.098091 | 161.319273 | 441.866955 | 438.705324 | 386.587738 | 436.528436 | 48.623081 | 525.480383 | 0.518911 | 0.238753 | 0.550916 | 0.502885 | 0.815778 | 0.220338 | 1.625393 | 0.644666 | 24.689725 | 0.747315 | 213.804841 | 125.338794 | 66.256028 | 61.119149 | 29.317331 | 55.757415 | 40.177307 | 496.123024 | 2.703626 | 1.328095 | 79442.502883 |
| min | 20.000000 | 21.000000 | 1300.000000 | 1.000000 | 1.000000 | 1872.000000 | 1950.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 334.000000 | 0.000000 | 0.000000 | 334.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 2.000000 | 0.000000 | 1900.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 2006.000000 | 34900.000000 |
| 25% | 20.000000 | 59.000000 | 7553.500000 | 5.000000 | 5.000000 | 1954.000000 | 1967.000000 | 0.000000 | 0.000000 | 0.000000 | 223.000000 | 795.750000 | 882.000000 | 0.000000 | 0.000000 | 1129.500000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 | 2.000000 | 1.000000 | 5.000000 | 0.000000 | 1961.000000 | 1.000000 | 334.500000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 5.000000 | 2007.000000 | 129975.000000 |
| 50% | 50.000000 | 69.000000 | 9478.500000 | 6.000000 | 5.000000 | 1973.000000 | 1994.000000 | 0.000000 | 383.500000 | 0.000000 | 477.500000 | 991.500000 | 1087.000000 | 0.000000 | 0.000000 | 1464.000000 | 0.000000 | 0.000000 | 2.000000 | 0.000000 | 3.000000 | 1.000000 | 6.000000 | 1.000000 | 1980.000000 | 2.000000 | 480.000000 | 0.000000 | 25.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 6.000000 | 2008.000000 | 163000.000000 |
| 75% | 70.000000 | 80.000000 | 11601.500000 | 7.000000 | 6.000000 | 2000.000000 | 2004.000000 | 166.000000 | 712.250000 | 0.000000 | 808.000000 | 1298.250000 | 1391.250000 | 728.000000 | 0.000000 | 1776.750000 | 1.000000 | 0.000000 | 2.000000 | 1.000000 | 3.000000 | 1.000000 | 7.000000 | 1.000000 | 2002.000000 | 2.000000 | 576.000000 | 168.000000 | 68.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 8.000000 | 2009.000000 | 214000.000000 |
| max | 190.000000 | 313.000000 | 215245.000000 | 10.000000 | 9.000000 | 2010.000000 | 2010.000000 | 1600.000000 | 5644.000000 | 1474.000000 | 2336.000000 | 6110.000000 | 4692.000000 | 2065.000000 | 572.000000 | 5642.000000 | 3.000000 | 2.000000 | 3.000000 | 2.000000 | 8.000000 | 3.000000 | 14.000000 | 3.000000 | 2010.000000 | 4.000000 | 1418.000000 | 857.000000 | 547.000000 | 552.000000 | 508.000000 | 480.000000 | 738.000000 | 15500.000000 | 12.000000 | 2010.000000 | 755000.000000 |
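`describe()` reports the statistics requested in item 1.1. To make each row concrete, the sketch below recomputes them by hand for the first five SalePrice values shown by `df.head()` (a tiny illustrative sample, not the full column). It assumes pandas' defaults: the sample standard deviation (denominator n - 1) and linearly interpolated quartiles.

```python
import numpy as np
import pandas as pd

# First five SalePrice values from df.head(), used only as a small example
x = pd.Series([208500, 181500, 223500, 140000, 250000], dtype=float)

n = len(x)
mean = x.sum() / n                                # sample mean, x-bar
s = np.sqrt(((x - mean) ** 2).sum() / (n - 1))    # sample standard deviation (n - 1)
q1, median, q3 = np.percentile(x, [25, 50, 75])   # linear interpolation between sorted values

desc = x.describe()
print(f"mean={mean:.1f} std={s:.1f} Q1={q1:.1f} median={median:.1f} Q3={q3:.1f}")
print(desc)
```

For these five values the mean is 200,700, Q1 = 181,500, the median is 208,500 and Q3 = 223,500, and the hand-computed values agree with `describe()`.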
Descriptive analysis of the categorical variables
df.describe(include=object)
| | MSZoning | Street | Alley | LotShape | LandContour | Utilities | LotConfig | LandSlope | Neighborhood | Condition1 | Condition2 | BldgType | HouseStyle | RoofStyle | RoofMatl | Exterior1st | Exterior2nd | MasVnrType | ExterQual | ExterCond | Foundation | BsmtQual | BsmtCond | BsmtExposure | BsmtFinType1 | BsmtFinType2 | Heating | HeatingQC | CentralAir | Electrical | KitchenQual | Functional | FireplaceQu | GarageType | GarageFinish | GarageQual | GarageCond | PavedDrive | PoolQC | Fence | MiscFeature | SaleType | SaleCondition |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 1460 | 1460 | 91 | 1460 | 1460 | 1460 | 1460 | 1460 | 1460 | 1460 | 1460 | 1460 | 1460 | 1460 | 1460 | 1460 | 1460 | 1452 | 1460 | 1460 | 1460 | 1423 | 1423 | 1422 | 1423 | 1422 | 1460 | 1460 | 1460 | 1459 | 1460 | 1460 | 770 | 1379 | 1379 | 1379 | 1379 | 1460 | 7 | 281 | 54 | 1460 | 1460 |
| unique | 5 | 2 | 2 | 4 | 4 | 2 | 5 | 3 | 25 | 9 | 8 | 5 | 8 | 6 | 8 | 15 | 16 | 4 | 4 | 5 | 6 | 4 | 4 | 4 | 6 | 6 | 6 | 5 | 2 | 5 | 4 | 7 | 5 | 6 | 3 | 5 | 5 | 3 | 3 | 4 | 4 | 9 | 6 |
| top | RL | Pave | Grvl | Reg | Lvl | AllPub | Inside | Gtl | NAmes | Norm | Norm | 1Fam | 1Story | Gable | CompShg | VinylSd | VinylSd | None | TA | TA | PConc | TA | TA | No | Unf | Unf | GasA | Ex | Y | SBrkr | TA | Typ | Gd | Attchd | Unf | TA | TA | Y | Gd | MnPrv | Shed | WD | Normal |
| freq | 1151 | 1454 | 50 | 925 | 1311 | 1459 | 1052 | 1382 | 225 | 1260 | 1445 | 1220 | 726 | 1141 | 1434 | 515 | 504 | 864 | 906 | 1282 | 647 | 649 | 1311 | 953 | 430 | 1256 | 1428 | 741 | 1365 | 1334 | 735 | 1360 | 380 | 870 | 605 | 1311 | 1326 | 1340 | 3 | 157 | 49 | 1267 | 1198 |
# Frequency table for every categorical (object) column
for col in df.select_dtypes(include='object').columns:
    print(pd.crosstab(index=df[col], columns='freq', dropna=False))
    print('')
col_0 freq
MSZoning
C (all) 10
FV 65
RH 16
RL 1151
RM 218
col_0 freq
Street
Grvl 6
Pave 1454
col_0 freq
Alley
Grvl 50
Pave 41
col_0 freq
LotShape
IR1 484
IR2 41
IR3 10
Reg 925
col_0 freq
LandContour
Bnk 63
HLS 50
Low 36
Lvl 1311
col_0 freq
Utilities
AllPub 1459
NoSeWa 1
col_0 freq
LotConfig
Corner 263
CulDSac 94
FR2 47
FR3 4
Inside 1052
col_0 freq
LandSlope
Gtl 1382
Mod 65
Sev 13
col_0 freq
Neighborhood
Blmngtn 17
Blueste 2
BrDale 16
BrkSide 58
ClearCr 28
CollgCr 150
Crawfor 51
Edwards 100
Gilbert 79
IDOTRR 37
MeadowV 17
Mitchel 49
NAmes 225
NPkVill 9
NWAmes 73
NoRidge 41
NridgHt 77
OldTown 113
SWISU 25
Sawyer 74
SawyerW 59
Somerst 86
StoneBr 25
Timber 38
Veenker 11
col_0 freq
Condition1
Artery 48
Feedr 81
Norm 1260
PosA 8
PosN 19
RRAe 11
RRAn 26
RRNe 2
RRNn 5
col_0 freq
Condition2
Artery 2
Feedr 6
Norm 1445
PosA 1
PosN 2
RRAe 1
RRAn 1
RRNn 2
col_0 freq
BldgType
1Fam 1220
2fmCon 31
Duplex 52
Twnhs 43
TwnhsE 114
col_0 freq
HouseStyle
1.5Fin 154
1.5Unf 14
1Story 726
2.5Fin 8
2.5Unf 11
2Story 445
SFoyer 37
SLvl 65
col_0 freq
RoofStyle
Flat 13
Gable 1141
Gambrel 11
Hip 286
Mansard 7
Shed 2
col_0 freq
RoofMatl
ClyTile 1
CompShg 1434
Membran 1
Metal 1
Roll 1
Tar&Grv 11
WdShake 5
WdShngl 6
col_0 freq
Exterior1st
AsbShng 20
AsphShn 1
BrkComm 2
BrkFace 50
CBlock 1
CemntBd 61
HdBoard 222
ImStucc 1
MetalSd 220
Plywood 108
Stone 2
Stucco 25
VinylSd 515
Wd Sdng 206
WdShing 26
col_0 freq
Exterior2nd
AsbShng 20
AsphShn 3
Brk Cmn 7
BrkFace 25
CBlock 1
CmentBd 60
HdBoard 207
ImStucc 10
MetalSd 214
Other 1
Plywood 142
Stone 5
Stucco 26
VinylSd 504
Wd Sdng 197
Wd Shng 38
col_0 freq
MasVnrType
BrkCmn 15
BrkFace 445
None 864
Stone 128
col_0 freq
ExterQual
Ex 52
Fa 14
Gd 488
TA 906
col_0 freq
ExterCond
Ex 3
Fa 28
Gd 146
Po 1
TA 1282
col_0 freq
Foundation
BrkTil 146
CBlock 634
PConc 647
Slab 24
Stone 6
Wood 3
col_0 freq
BsmtQual
Ex 121
Fa 35
Gd 618
TA 649
col_0 freq
BsmtCond
Fa 45
Gd 65
Po 2
TA 1311
col_0 freq
BsmtExposure
Av 221
Gd 134
Mn 114
No 953
col_0 freq
BsmtFinType1
ALQ 220
BLQ 148
GLQ 418
LwQ 74
Rec 133
Unf 430
col_0 freq
BsmtFinType2
ALQ 19
BLQ 33
GLQ 14
LwQ 46
Rec 54
Unf 1256
col_0 freq
Heating
Floor 1
GasA 1428
GasW 18
Grav 7
OthW 2
Wall 4
col_0 freq
HeatingQC
Ex 741
Fa 49
Gd 241
Po 1
TA 428
col_0 freq
CentralAir
N 95
Y 1365
col_0 freq
Electrical
FuseA 94
FuseF 27
FuseP 3
Mix 1
SBrkr 1334
col_0 freq
KitchenQual
Ex 100
Fa 39
Gd 586
TA 735
col_0 freq
Functional
Maj1 14
Maj2 5
Min1 31
Min2 34
Mod 15
Sev 1
Typ 1360
col_0 freq
FireplaceQu
Ex 24
Fa 33
Gd 380
Po 20
TA 313
col_0 freq
GarageType
2Types 6
Attchd 870
Basment 19
BuiltIn 88
CarPort 9
Detchd 387
col_0 freq
GarageFinish
Fin 352
RFn 422
Unf 605
col_0 freq
GarageQual
Ex 3
Fa 48
Gd 14
Po 3
TA 1311
col_0 freq
GarageCond
Ex 2
Fa 35
Gd 9
Po 7
TA 1326
col_0 freq
PavedDrive
N 90
P 30
Y 1340
col_0 freq
PoolQC
Ex 2
Fa 2
Gd 3
col_0 freq
Fence
GdPrv 59
GdWo 54
MnPrv 157
MnWw 11
col_0 freq
MiscFeature
Gar2 2
Othr 2
Shed 49
TenC 1
col_0 freq
SaleType
COD 43
CWD 4
Con 2
ConLD 9
ConLI 5
ConLw 5
New 122
Oth 3
WD 1267
col_0 freq
SaleCondition
Abnorml 101
AdjLand 4
Alloca 12
Family 20
Normal 1198
Partial 125
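The cross-tabulations above give absolute frequencies only, and houses with a missing value are not listed (Alley shows 50 + 41 = 91 houses out of 1460). Item 1.1 also asks for proportions. A minimal sketch using `Series.value_counts(dropna=False)`, which keeps NaN as its own row, and adding a proportion column; `freq_table` is a hypothetical helper and the toy series stands in for a real column.

```python
import numpy as np
import pandas as pd


def freq_table(s: pd.Series) -> pd.DataFrame:
    """Absolute frequency and proportion per category, with NaN counted as a category."""
    counts = s.value_counts(dropna=False)          # sorted by frequency, NaN included
    table = counts.to_frame("freq")
    table["proportion"] = table["freq"] / table["freq"].sum()
    return table


# Toy Alley-like column (hypothetical values): most houses have no alley access
alley = pd.Series(["Grvl", np.nan, np.nan, "Pave", np.nan, "Grvl"], name="Alley")
print(freq_table(alley))
```

Applied to the real data, `freq_table(df['Alley'])` would add a NaN row with 1369 houses (1460 - 91), about 93.8% of the sample.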
1.2 Charts such as column charts, box plots and scatter plots (2.0)
For numeric variables we plot box plots, and for categorical variables bar charts of the category counts.
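# Box plot for each numeric variable, bar chart of category counts for each categorical one
for col in df.columns:
    fig, ax = plt.subplots()
    if pd.api.types.is_numeric_dtype(df[col]):
        seaborn.boxplot(y=df[col], ax=ax)
    else:
        seaborn.countplot(x=df[col], ax=ax)
    ax.set_title(col)
    # Display, then close the figure so dozens of figures are not kept open in memory
    plt.show()
    plt.close(fig)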
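In the box plots, the points drawn beyond the whiskers are the values outside the 1.5 × IQR fences, matplotlib's default whisker rule (`whis=1.5`), which seaborn's `boxplot` uses unless told otherwise. With the SalePrice quartiles from `describe()` (Q1 = 129,975, Q3 = 214,000), the IQR is 84,025 and the upper fence is 214,000 + 1.5 × 84,025 = 340,037.5. The sketch below flags the same points numerically; `iqr_outliers` is a hypothetical helper.

```python
import pandas as pd


def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile([0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return s[(s < lower) | (s > upper)]


# Fence arithmetic with the SalePrice quartiles reported by describe()
q1_price, q3_price = 129975, 214000
upper_fence = q3_price + 1.5 * (q3_price - q1_price)
print(upper_fence)

# Toy series (hypothetical values): only 100 lies outside the fences
print(iqr_outliers(pd.Series([1, 2, 3, 4, 100])))
```

On the real data, `iqr_outliers(df['SalePrice'])` would return only sales above 340,037.5, since the lower fence (3,937.5) is below the minimum price of 34,900.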
Scatter plots for the numeric variables
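# Scatter plot of every numeric variable against the response, SalePrice
for col in df.select_dtypes(include='number').columns.drop('SalePrice'):
    fig, ax = plt.subplots()
    seaborn.scatterplot(x=df[col], y=df['SalePrice'], ax=ax)
    plt.show()
    plt.close(fig)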
2 - Correlation analysis: r(xᵢ, xⱼ)
2.1 Correlogram (1.5)
# Pearson correlations between the numeric columns only
df_corr = df.select_dtypes(include='number').corr()
fig, ax = plt.subplots(figsize=(20, 12))
# Mask the upper triangle (diagonal included) so each pair is shown once
mask = np.triu(np.ones_like(df_corr, dtype=bool))
# Drop the first row and last column, which would be fully masked
mask = mask[1:, :-1]
corr = df_corr.iloc[1:, :-1].copy()
# Plot heatmap
seaborn.heatmap(corr, mask=mask, annot=True, fmt=".2f", cmap='Blues',
                vmin=-1, vmax=1, cbar_kws={"shrink": .8}, ax=ax)
# Keep the y tick labels horizontal
plt.yticks(rotation=0)
plt.show()
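Each cell of the correlogram is a Pearson correlation coefficient, r = Σ(xᵢ - x̄)(yᵢ - ȳ) / √(Σ(xᵢ - x̄)² · Σ(yᵢ - ȳ)²). `DataFrame.corr()` computes it pairwise, using for each pair only the rows where both values are present (relevant for LotFrontage, MasVnrArea and GarageYrBlt). A minimal hand-rolled version, checked against pandas on the five houses shown by `df.head()`; `pearson_r` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def pearson_r(x, y) -> float:
    """Pearson correlation: centred cross-products over the product of centred norms."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


# GrLivArea and SalePrice of the five houses shown by df.head()
gr_liv_area = [1710, 1262, 1786, 1717, 2198]
sale_price = [208500, 181500, 223500, 140000, 250000]

r_manual = pearson_r(gr_liv_area, sale_price)
r_pandas = pd.Series(gr_liv_area).corr(pd.Series(sale_price))
print(r_manual, r_pandas)
```

Five points are far too few for a reliable estimate; the snippet only shows that the formula and pandas agree.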
2.2 Analysis of significant correlations (1.5)
Correlations with the response variable
df.select_dtypes(include='number').corr().loc['SalePrice']
MSSubClass -0.084284
LotFrontage 0.351799
LotArea 0.263843
OverallQual 0.790982
OverallCond -0.077856
YearBuilt 0.522897
YearRemodAdd 0.507101
MasVnrArea 0.477493
BsmtFinSF1 0.386420
BsmtFinSF2 -0.011378
BsmtUnfSF 0.214479
TotalBsmtSF 0.613581
1stFlrSF 0.605852
2ndFlrSF 0.319334
LowQualFinSF -0.025606
GrLivArea 0.708624
BsmtFullBath 0.227122
BsmtHalfBath -0.016844
FullBath 0.560664
HalfBath 0.284108
BedroomAbvGr 0.168213
KitchenAbvGr -0.135907
TotRmsAbvGrd 0.533723
Fireplaces 0.466929
GarageYrBlt 0.486362
GarageCars 0.640409
GarageArea 0.623431
WoodDeckSF 0.324413
OpenPorchSF 0.315856
EnclosedPorch -0.128578
3SsnPorch 0.044584
ScreenPorch 0.111447
PoolArea 0.092404
MiscVal -0.021190
MoSold 0.046432
YrSold -0.028923
SalePrice 1.000000
Name: SalePrice, dtype: float64
The variables most strongly correlated with SalePrice (r > 0.6) are:
OverallQual 0.790982 - Makes sense, since this is a rating of the overall material and finish quality of the house.
TotalBsmtSF 0.613581 - Total basement area; a larger basement adds usable space, which could, for instance, be turned into a rental unit.
1stFlrSF 0.605852 - The larger the first floor, the larger the main living space, and therefore the higher the price.
GrLivArea 0.708624 - Same reasoning as above: above-grade (ground) living area, i.e. 1stFlrSF + 2ndFlrSF + LowQualFinSF (856 + 854 + 0 = 1710 for the first house).
GarageCars 0.640409 - The number of cars that fit in the garage adds value to the house.
GarageArea 0.623431 - Strongly correlated with GarageCars (0.88), so the two carry largely the same information.
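The variables above were chosen by the size of r. Whether a correlation is statistically significant is a separate question: under H0: ρ = 0, t = r·√(n - 2) / √(1 - r²) follows a t distribution with n - 2 degrees of freedom. With n = 1460 and the normal approximation |t| > 1.96 (adequate for 1458 degrees of freedom), the threshold is |r| > 1.96 / √(n - 2 + 1.96²) ≈ 0.051. So BsmtFinSF2, LowQualFinSF, BsmtHalfBath, 3SsnPorch, MiscVal, MoSold and YrSold (all |r| < 0.05) are not significant at 5%, while the remaining correlations are. A minimal numpy sketch under that approximation; `r_t_statistic` is a hypothetical helper, and `scipy.stats.pearsonr` gives exact p-values.

```python
import numpy as np


def r_t_statistic(r: float, n: int) -> float:
    """t statistic for H0: rho = 0 (n - 2 degrees of freedom)."""
    return r * np.sqrt((n - 2) / (1 - r ** 2))


n = 1460
z = 1.96                                   # two-sided 5% critical value, normal approximation
r_crit = z / np.sqrt(n - 2 + z ** 2)       # |r| at which |t| equals z
print(f"critical |r| = {r_crit:.4f}")

# Correlations with SalePrice taken from the output above
for name, r in [("OverallQual", 0.790982), ("OverallCond", -0.077856), ("MoSold", 0.046432)]:
    t = r_t_statistic(r, n)
    print(f"{name}: r = {r:+.3f}, t = {t:+.2f}, significant at 5%: {abs(t) > z}")
```

For columns with missing values (LotFrontage, MasVnrArea, GarageYrBlt) the effective n is smaller, but their |r| is far above the threshold anyway.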
3 - Development of a regression model using linear regression fitted by ordinary least squares (OLS). Present the characteristics of the development: samples, model evaluation metrics...
First, we preprocess the categorical variables and handle the missing values.
# Columns treated as categorical. pd.get_dummies only expands object/category columns,
# so MSSubClass (stored as int64) ends up as a single numeric column.
categorical_data = ['MSSubClass','MSZoning','Street','Alley','LotShape','LandContour','Utilities','LotConfig',
    'LandSlope','Neighborhood','Condition1','Condition2','BldgType','HouseStyle','RoofStyle','RoofMatl','Exterior1st',
    'Exterior2nd','MasVnrType','ExterQual','ExterCond','Foundation','BsmtQual','BsmtCond','BsmtExposure','BsmtFinType1',
    'BsmtFinType2','Heating','HeatingQC','CentralAir','Electrical','KitchenQual','Functional','FireplaceQu','GarageType',
    'GarageFinish','GarageQual','GarageCond','PavedDrive','PoolQC','Fence','MiscFeature','SaleType','SaleCondition']
num_data = ['LotFrontage','LotArea','OverallQual','OverallCond','MasVnrArea','BsmtFinSF1','BsmtFinSF2','BsmtUnfSF',
    'TotalBsmtSF','1stFlrSF','2ndFlrSF','LowQualFinSF','GrLivArea','BsmtFullBath','BsmtHalfBath','FullBath','HalfBath',
    'BedroomAbvGr','KitchenAbvGr','TotRmsAbvGrd','Fireplaces','GarageCars','GarageArea','WoodDeckSF','OpenPorchSF',
    'EnclosedPorch','3SsnPorch','ScreenPorch','PoolArea','MiscVal']
# Date columns: kept out of X; garageTime and timeToSell below are derived from some of them
date_data = ['YearBuilt','YearRemodAdd','GarageYrBlt','MoSold','YrSold']
Y = df.SalePrice
# One-hot encoding; missing categories become their own 'NA' level
X_cat_df = pd.get_dummies(df[categorical_data].fillna('NA'))
# Missing numeric values (e.g. LotFrontage, MasVnrArea) are set to 0
X_num_data = df[num_data].fillna(0)
# Garage age and house age at the time of sale, in years
df['garageTime'] = df.YrSold - df.GarageYrBlt
df['timeToSell'] = df.YrSold - df.YearBuilt
# Houses without a garage get garageTime = 0 (same value as a garage built in the year of sale)
X = pd.concat([X_cat_df, X_num_data, df.garageTime.fillna(0), df.timeToSell.fillna(0)], axis = 1)
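`categorical_data` lists MSSubClass, but `pd.get_dummies` only expands columns with object, string or category dtype. MSSubClass is stored as int64 (see `df.info()`), so it passes through as one numeric column, which is why the OLS summary further down reports a single MSSubClass coefficient. A toy sketch (hypothetical values) of this behaviour and of the cast that would make the codes categorical:

```python
import pandas as pd

# Toy frame (hypothetical values): MSSubClass holds integer dwelling-type codes
toy = pd.DataFrame({
    "MSSubClass": [60, 20, 60, 70],
    "MSZoning": ["RL", "RL", "RM", "RL"],
})

# Only the object column is one-hot encoded; the int64 codes stay as they are
encoded = pd.get_dummies(toy)

# Casting the codes to str makes get_dummies treat them as categories too
encoded_cat = pd.get_dummies(toy.astype({"MSSubClass": str}))

print(list(encoded.columns))
print(list(encoded_cat.columns))
```

Applying the cast in the notebook would change every downstream result, including the selected features, so it is shown here only as an option.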
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error
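We hold out 30% of the data for testing and keep 70% for training. A random forest is then fitted on the training set only, to rank the features by importance; the 20 most important ones are used as inputs to the linear regression.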
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=1234)
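With `test_size=0.3`, `train_test_split` rounds the test size up: ceil(0.3 × 1460) = 438 test rows, leaving 1022 for training (the 1022 observations reported in the OLS summary below). The sketch reproduces that arithmetic with a plain numpy shuffle; `split_indices` is hypothetical and will not pick the same rows as scikit-learn for the same seed.

```python
import numpy as np


def split_indices(n: int, test_size: float, seed: int):
    """Shuffle row positions and cut them into train/test parts (test size rounded up)."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    n_test = int(np.ceil(test_size * n))
    return perm[n_test:], perm[:n_test]


train_idx, test_idx = split_indices(1460, 0.3, seed=1234)
print(len(train_idx), len(test_idx))
```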
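# Random forest used only to rank the features by (impurity-based) importance;
# without random_state the ranking can change slightly between runs
rf = RandomForestRegressor(n_estimators=900, n_jobs=-1, max_depth=5)
rf.fit(X_train, y_train)
importances = pd.Series(rf.feature_importances_, index=X_train.columns)
# Keep the 20 most important features for the linear regression
best_feat = list(importances.sort_values(ascending=False).index[:20])
best_feat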
['OverallQual',
'GrLivArea',
'2ndFlrSF',
'TotalBsmtSF',
'BsmtFinSF1',
'1stFlrSF',
'TotRmsAbvGrd',
'LotArea',
'GarageArea',
'FullBath',
'GarageCars',
'timeToSell',
'MasVnrArea',
'WoodDeckSF',
'LotFrontage',
'Fireplaces',
'OpenPorchSF',
'KitchenQual_Gd',
'GarageType_Detchd',
'BsmtUnfSF']
lr = LinearRegression()
lr.fit(X_train[best_feat], y_train)
y_pred_test = lr.predict(X_test[best_feat])
print('Test R2 =', r2_score(y_test, y_pred_test))
print('Test MAE =', mean_absolute_error(y_test, y_pred_test))
print('Test RMSE =', np.sqrt(mean_squared_error(y_test, y_pred_test)))
Test R2 = 0.8227891508565189
Test MAE = 22028.619292218595
Test RMSE = 29760.688359672662
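The three metrics printed above follow directly from their definitions: R² = 1 - SS_res / SS_tot, MAE = mean |yᵢ - ŷᵢ| and RMSE = √(mean (yᵢ - ŷᵢ)²). A numpy sketch (`regression_metrics` is a hypothetical helper) that should match scikit-learn's `r2_score`, `mean_absolute_error` and the square root of `mean_squared_error`:

```python
import numpy as np


def regression_metrics(y_true, y_pred) -> dict:
    """R2, MAE and RMSE from their definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    ss_res = (resid ** 2).sum()                       # residual sum of squares
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()    # total sum of squares
    return {
        "R2": 1 - ss_res / ss_tot,
        "MAE": np.abs(resid).mean(),
        "RMSE": np.sqrt((resid ** 2).mean()),
    }


print(regression_metrics([1, 2, 3], [1, 2, 4]))
```

For the toy call, SS_res = 1 and SS_tot = 2, so R² = 0.5, MAE = 1/3 and RMSE = √(1/3) ≈ 0.577. Because RMSE squares the residuals, it is more sensitive to a few large errors than MAE.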
y_pred_train = lr.predict(X_train[best_feat])
print('Train R2 =', r2_score(y_train, y_pred_train))
print('Train MAE =', mean_absolute_error(y_train, y_pred_train))
print('Train RMSE =', np.sqrt(mean_squared_error(y_train, y_pred_train)))
Train R2 = 0.7783998968515039
Train MAE = 23762.54690816806
Train RMSE = 38986.12363092791
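The `sm.OLS` fit that follows uses every column of `X_train`, that is, all levels of each categorical plus the constant added by `sm.add_constant`. The dummies of one categorical then sum to the constant column, so the design matrix is rank-deficient (the dummy-variable trap). statsmodels still returns estimates because `OLS.fit` uses the pseudo-inverse (`method='pinv'`) by default, but individual dummy coefficients are not identifiable. The summary shows symptoms consistent with this: nearly all levels of MSZoning, Street, LotShape and similar groups have large negative coefficients, and levels with a single house in the data (Condition2_RRAn, RoofMatl_Roll, Exterior1st_AsphShn) get coefficients of order 1e-8 or smaller. A toy sketch (hypothetical data) of the rank loss and of the usual fix, dropping one reference level per categorical:

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values): living area in 1000 sq ft and a 3-level zoning code
toy = pd.DataFrame({
    "GrLivArea_k": [1.71, 1.26, 1.79, 1.72, 2.20, 1.36],
    "MSZoning": ["RL", "RL", "RM", "FV", "RL", "RM"],
})
y = np.array([208.5, 181.5, 223.5, 140.0, 250.0, 143.0])  # price in $1000


def design(frame: pd.DataFrame, drop_first: bool) -> np.ndarray:
    """Constant column followed by numeric columns and one-hot dummies."""
    X = pd.get_dummies(frame, drop_first=drop_first, dtype=float)
    X.insert(0, "const", 1.0)
    return X.to_numpy()


X_full = design(toy, drop_first=False)  # const + all 3 levels: the dummies sum to const
X_ref = design(toy, drop_first=True)    # FV dropped and absorbed into the constant

print(np.linalg.matrix_rank(X_full), X_full.shape[1])
print(np.linalg.matrix_rank(X_ref), X_ref.shape[1])

# OLS on the full-rank design: beta minimises ||y - X beta||^2
beta, *_ = np.linalg.lstsq(X_ref, y, rcond=None)
print(beta)
```

In the notebook, the equivalent change would be `pd.get_dummies(..., drop_first=True)`; each remaining dummy coefficient is then read relative to the dropped reference level.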
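import statsmodels.api as sm
from scipy import stats
# OLS on ALL columns of X_train (not only best_feat), so this is a different model from lr.
# add_constant plus every dummy level gives a rank-deficient design; the default 'pinv'
# solver still returns estimates, but single dummy coefficients are not identifiable.
# astype(float) avoids an object-dtype error when get_dummies produced bool columns.
X2_train = sm.add_constant(X_train.astype(float))
est = sm.OLS(y_train, X2_train)
est2 = est.fit()
print(est2.summary())
# Residual plot of the 20-feature linear model on the test set
y_pred_test = lr.predict(X_test[best_feat])
seaborn.residplot(x=y_pred_test, y=y_test, lowess=True,
                  line_kws={'color': 'red', 'lw': 1, 'alpha': 1})
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.title('Residual plot')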
OLS Regression Results
==============================================================================
Dep. Variable: SalePrice R-squared: 0.940
Model: OLS Adj. R-squared: 0.921
Method: Least Squares F-statistic: 49.99
Date: Wed, 28 Jul 2021 Prob (F-statistic): 0.00
Time: 00:41:49 Log-Likelihood: -11590.
No. Observations: 1022 AIC: 2.367e+04
Df Residuals: 779 BIC: 2.486e+04
Df Model: 242
Covariance Type: nonrobust
=========================================================================================
coef std err t P>|t| [0.025 0.975]
-----------------------------------------------------------------------------------------
const -2.458e+05 3.91e+04 -6.285 0.000 -3.23e+05 -1.69e+05
MSSubClass -128.1975 102.561 -1.250 0.212 -329.526 73.131
MSZoning_C (all) -7.317e+04 1.28e+04 -5.695 0.000 -9.84e+04 -4.8e+04
MSZoning_FV -2.955e+04 1.11e+04 -2.672 0.008 -5.13e+04 -7838.915
MSZoning_RH -4.981e+04 1.15e+04 -4.328 0.000 -7.24e+04 -2.72e+04
MSZoning_RL -4.689e+04 8769.215 -5.347 0.000 -6.41e+04 -2.97e+04
MSZoning_RM -4.639e+04 9210.219 -5.037 0.000 -6.45e+04 -2.83e+04
Street_Grvl -1.42e+05 2.1e+04 -6.770 0.000 -1.83e+05 -1.01e+05
Street_Pave -1.038e+05 2.05e+04 -5.059 0.000 -1.44e+05 -6.35e+04
Alley_Grvl -8.44e+04 1.37e+04 -6.158 0.000 -1.11e+05 -5.75e+04
Alley_NA -8.26e+04 1.32e+04 -6.258 0.000 -1.09e+05 -5.67e+04
Alley_Pave -7.882e+04 1.38e+04 -5.716 0.000 -1.06e+05 -5.17e+04
LotShape_IR1 -6.725e+04 1.07e+04 -6.274 0.000 -8.83e+04 -4.62e+04
LotShape_IR2 -5.643e+04 1.14e+04 -4.939 0.000 -7.89e+04 -3.4e+04
LotShape_IR3 -5.685e+04 1.59e+04 -3.576 0.000 -8.81e+04 -2.56e+04
LotShape_Reg -6.528e+04 1.07e+04 -6.123 0.000 -8.62e+04 -4.44e+04
LandContour_Bnk -6.456e+04 1.06e+04 -6.103 0.000 -8.53e+04 -4.38e+04
LandContour_HLS -5.719e+04 1.05e+04 -5.426 0.000 -7.79e+04 -3.65e+04
LandContour_Low -7.236e+04 1.11e+04 -6.510 0.000 -9.42e+04 -5.05e+04
LandContour_Lvl -5.17e+04 1.02e+04 -5.067 0.000 -7.17e+04 -3.17e+04
Utilities_AllPub -9.693e+04 2.38e+04 -4.077 0.000 -1.44e+05 -5.03e+04
Utilities_NoSeWa -1.489e+05 2.49e+04 -5.970 0.000 -1.98e+05 -9.99e+04
LotConfig_Corner -4.482e+04 8533.118 -5.252 0.000 -6.16e+04 -2.81e+04
LotConfig_CulDSac -3.386e+04 8950.467 -3.783 0.000 -5.14e+04 -1.63e+04
LotConfig_FR2 -5.579e+04 9175.010 -6.080 0.000 -7.38e+04 -3.78e+04
LotConfig_FR3 -6.288e+04 1.3e+04 -4.830 0.000 -8.84e+04 -3.73e+04
LotConfig_Inside -4.846e+04 8449.359 -5.735 0.000 -6.5e+04 -3.19e+04
LandSlope_Gtl -7.52e+04 1.38e+04 -5.447 0.000 -1.02e+05 -4.81e+04
LandSlope_Mod -6.605e+04 1.38e+04 -4.791 0.000 -9.31e+04 -3.9e+04
LandSlope_Sev -1.046e+05 1.59e+04 -6.567 0.000 -1.36e+05 -7.33e+04
Neighborhood_Blmngtn 503.6343 8759.533 0.057 0.954 -1.67e+04 1.77e+04
Neighborhood_Blueste 994.6400 2.69e+04 0.037 0.970 -5.17e+04 5.37e+04
Neighborhood_BrDale -1.242e+04 1.02e+04 -1.220 0.223 -3.24e+04 7564.508
Neighborhood_BrkSide -1.67e+04 6322.922 -2.641 0.008 -2.91e+04 -4289.422
Neighborhood_ClearCr -2.092e+04 7565.303 -2.766 0.006 -3.58e+04 -6074.143
Neighborhood_CollgCr -1.652e+04 4088.793 -4.040 0.000 -2.45e+04 -8490.836
Neighborhood_Crawfor 3990.1010 5903.445 0.676 0.499 -7598.444 1.56e+04
Neighborhood_Edwards -2.906e+04 4436.518 -6.550 0.000 -3.78e+04 -2.04e+04
Neighborhood_Gilbert -1.828e+04 5380.529 -3.397 0.001 -2.88e+04 -7714.213
Neighborhood_IDOTRR -2.234e+04 8649.592 -2.582 0.010 -3.93e+04 -5357.224
Neighborhood_MeadowV -2.221e+04 9740.665 -2.280 0.023 -4.13e+04 -3087.845
Neighborhood_Mitchel -2.981e+04 5549.421 -5.371 0.000 -4.07e+04 -1.89e+04
Neighborhood_NAmes -2.345e+04 3986.964 -5.881 0.000 -3.13e+04 -1.56e+04
Neighborhood_NPkVill 1.348e+04 1.35e+04 1.000 0.317 -1.3e+04 3.99e+04
Neighborhood_NWAmes -2.499e+04 4789.095 -5.218 0.000 -3.44e+04 -1.56e+04
Neighborhood_NoRidge 2.581e+04 6895.206 3.743 0.000 1.23e+04 3.93e+04
Neighborhood_NridgHt 1.536e+04 5423.660 2.832 0.005 4715.497 2.6e+04
Neighborhood_OldTown -2.656e+04 6595.916 -4.027 0.000 -3.95e+04 -1.36e+04
Neighborhood_SWISU -1.369e+04 7905.098 -1.732 0.084 -2.92e+04 1826.610
Neighborhood_Sawyer -1.866e+04 4741.836 -3.934 0.000 -2.8e+04 -9347.819
Neighborhood_SawyerW -7558.4985 5260.944 -1.437 0.151 -1.79e+04 2768.808
Neighborhood_Somerst -1.656e+04 7769.193 -2.131 0.033 -3.18e+04 -1308.686
Neighborhood_StoneBr 3.12e+04 7671.009 4.068 0.000 1.61e+04 4.63e+04
Neighborhood_Timber -1.763e+04 5896.456 -2.989 0.003 -2.92e+04 -6050.289
Neighborhood_Veenker 187.7040 1e+04 0.019 0.985 -1.94e+04 1.98e+04
Condition1_Artery -3.617e+04 7283.289 -4.966 0.000 -5.05e+04 -2.19e+04
Condition1_Feedr -3.268e+04 6784.319 -4.817 0.000 -4.6e+04 -1.94e+04
Condition1_Norm -2.089e+04 5991.325 -3.487 0.001 -3.27e+04 -9132.213
Condition1_PosA -1.131e+04 1.27e+04 -0.887 0.375 -3.63e+04 1.37e+04
Condition1_PosN -2.537e+04 9195.602 -2.759 0.006 -4.34e+04 -7323.669
Condition1_RRAe -4.851e+04 1.02e+04 -4.770 0.000 -6.85e+04 -2.85e+04
Condition1_RRAn -2.127e+04 8210.850 -2.591 0.010 -3.74e+04 -5155.595
Condition1_RRNe -3.288e+04 2.23e+04 -1.477 0.140 -7.66e+04 1.08e+04
Condition1_RRNn -1.672e+04 1.27e+04 -1.312 0.190 -4.17e+04 8296.937
Condition2_Artery 5439.7556 2.15e+04 0.253 0.801 -3.68e+04 4.77e+04
Condition2_Feedr -2817.6831 1.7e+04 -0.166 0.868 -3.61e+04 3.05e+04
Condition2_Norm -7947.9602 1.17e+04 -0.681 0.496 -3.09e+04 1.5e+04
Condition2_PosA 3.478e+04 3.63e+04 0.957 0.339 -3.66e+04 1.06e+05
Condition2_PosN -2.349e+05 2.07e+04 -11.353 0.000 -2.76e+05 -1.94e+05
Condition2_RRAe -4.461e+04 2.64e+04 -1.691 0.091 -9.64e+04 7171.173
Condition2_RRAn 8.442e-08 1.45e-08 5.836 0.000 5.6e-08 1.13e-07
Condition2_RRNn 4240.8154 2.55e+04 0.166 0.868 -4.59e+04 5.44e+04
BldgType_1Fam -4.301e+04 1.15e+04 -3.755 0.000 -6.55e+04 -2.05e+04
BldgType_2fmCon -3.699e+04 1.15e+04 -3.227 0.001 -5.95e+04 -1.45e+04
BldgType_Duplex -5.068e+04 1.07e+04 -4.740 0.000 -7.17e+04 -2.97e+04
BldgType_Twnhs -5.84e+04 1.03e+04 -5.675 0.000 -7.86e+04 -3.82e+04
BldgType_TwnhsE -5.673e+04 9490.127 -5.978 0.000 -7.54e+04 -3.81e+04
HouseStyle_1.5Fin -3.056e+04 6205.033 -4.925 0.000 -4.27e+04 -1.84e+04
HouseStyle_1.5Unf -2.14e+04 9626.175 -2.223 0.027 -4.03e+04 -2500.013
HouseStyle_1Story -2.623e+04 7229.351 -3.628 0.000 -4.04e+04 -1.2e+04
HouseStyle_2.5Fin -4.28e+04 1.61e+04 -2.654 0.008 -7.45e+04 -1.11e+04
HouseStyle_2.5Unf -3.862e+04 1.16e+04 -3.330 0.001 -6.14e+04 -1.59e+04
HouseStyle_2Story -3.558e+04 6084.177 -5.848 0.000 -4.75e+04 -2.36e+04
HouseStyle_SFoyer -2.702e+04 8644.411 -3.126 0.002 -4.4e+04 -1.01e+04
HouseStyle_SLvl -2.36e+04 7481.980 -3.154 0.002 -3.83e+04 -8913.893
RoofStyle_Flat -3.968e+04 1.91e+04 -2.078 0.038 -7.72e+04 -2189.825
RoofStyle_Gable -4.575e+04 1.02e+04 -4.473 0.000 -6.58e+04 -2.57e+04
RoofStyle_Gambrel -3.816e+04 1.39e+04 -2.737 0.006 -6.55e+04 -1.08e+04
RoofStyle_Hip -4.577e+04 1.04e+04 -4.402 0.000 -6.62e+04 -2.54e+04
RoofStyle_Mansard -3.185e+04 1.38e+04 -2.306 0.021 -5.9e+04 -4737.162
RoofStyle_Shed -4.461e+04 2.64e+04 -1.691 0.091 -9.64e+04 7171.173
RoofMatl_ClyTile -5.3e+05 4.56e+04 -11.623 0.000 -6.19e+05 -4.4e+05
RoofMatl_CompShg 2.662e+04 1.23e+04 2.161 0.031 2435.071 5.08e+04
RoofMatl_Membran 8.77e+04 2.85e+04 3.075 0.002 3.17e+04 1.44e+05
RoofMatl_Metal 5.871e+04 2.67e+04 2.199 0.028 6298.432 1.11e+05
RoofMatl_Roll -6.665e-09 1.16e-09 -5.732 0.000 -8.95e-09 -4.38e-09
RoofMatl_Tar&Grv 2.162e+04 1.49e+04 1.453 0.147 -7589.703 5.08e+04
RoofMatl_WdShake 5275.4932 2.49e+04 0.212 0.832 -4.36e+04 5.42e+04
RoofMatl_WdShngl 8.422e+04 1.58e+04 5.339 0.000 5.33e+04 1.15e+05
Exterior1st_AsbShng -8110.0658 1.51e+04 -0.539 0.590 -3.77e+04 2.14e+04
Exterior1st_AsphShn -2.597e-09 4.46e-10 -5.829 0.000 -3.47e-09 -1.72e-09
Exterior1st_BrkComm -2.949e+04 3.34e+04 -0.882 0.378 -9.51e+04 3.61e+04
Exterior1st_BrkFace -5247.8773 8066.045 -0.651 0.515 -2.11e+04 1.06e+04
Exterior1st_CBlock -1.949e+04 1.38e+04 -1.409 0.159 -4.66e+04 7654.347
Exterior1st_CemntBd -2.634e+04 1.96e+04 -1.347 0.178 -6.47e+04 1.2e+04
Exterior1st_HdBoard -2.245e+04 7495.308 -2.996 0.003 -3.72e+04 -7740.751
Exterior1st_ImStucc -1.507e+04 2.67e+04 -0.565 0.572 -6.75e+04 3.73e+04
Exterior1st_MetalSd -1.609e+04 1.12e+04 -1.436 0.151 -3.81e+04 5899.949
Exterior1st_Plywood -2.305e+04 7708.290 -2.991 0.003 -3.82e+04 -7923.336
Exterior1st_Stone 3339.4870 2.57e+04 0.130 0.896 -4.7e+04 5.37e+04
Exterior1st_Stucco -1.795e+04 1.23e+04 -1.458 0.145 -4.21e+04 6212.565
Exterior1st_VinylSd -2.064e+04 9321.258 -2.214 0.027 -3.89e+04 -2337.853
Exterior1st_Wd Sdng -2.712e+04 7289.028 -3.721 0.000 -4.14e+04 -1.28e+04
Exterior1st_WdShing -1.809e+04 9048.469 -1.999 0.046 -3.59e+04 -328.050
Exterior2nd_AsbShng -2.271e+04 1.45e+04 -1.569 0.117 -5.11e+04 5704.885
Exterior2nd_AsphShn -6549.5554 1.98e+04 -0.332 0.740 -4.53e+04 3.22e+04
Exterior2nd_Brk Cmn -2.21e+04 2.08e+04 -1.065 0.287 -6.28e+04 1.86e+04
Exterior2nd_BrkFace -1.42e+04 9028.069 -1.573 0.116 -3.19e+04 3524.880
Exterior2nd_CBlock -1.949e+04 1.38e+04 -1.409 0.159 -4.66e+04 7654.347
Exterior2nd_CmentBd 2672.3978 1.96e+04 0.136 0.892 -3.58e+04 4.12e+04
Exterior2nd_HdBoard -1.011e+04 6643.482 -1.521 0.129 -2.31e+04 2933.243
Exterior2nd_ImStucc -2.242e+04 1.29e+04 -1.745 0.081 -4.77e+04 2806.727
Exterior2nd_MetalSd -1.378e+04 1.05e+04 -1.307 0.192 -3.45e+04 6915.612
Exterior2nd_Other -4.103e+04 2.53e+04 -1.624 0.105 -9.06e+04 8573.829
Exterior2nd_Plywood -1.236e+04 6313.940 -1.957 0.051 -2.48e+04 35.822
Exterior2nd_Stone -2.918e+04 1.8e+04 -1.620 0.106 -6.45e+04 6174.330
Exterior2nd_Stucco -5630.2784 1.19e+04 -0.472 0.637 -2.9e+04 1.78e+04
Exterior2nd_VinylSd -1.006e+04 8374.636 -1.201 0.230 -2.65e+04 6383.563
Exterior2nd_Wd Sdng -2313.6978 6155.071 -0.376 0.707 -1.44e+04 9768.792
Exterior2nd_Wd Shng -1.656e+04 7620.241 -2.174 0.030 -3.15e+04 -1605.825
MasVnrType_BrkCmn -5.181e+04 1.07e+04 -4.849 0.000 -7.28e+04 -3.08e+04
MasVnrType_BrkFace -4.794e+04 8385.465 -5.717 0.000 -6.44e+04 -3.15e+04
MasVnrType_NA -5.985e+04 1.17e+04 -5.106 0.000 -8.29e+04 -3.68e+04
MasVnrType_None -4.586e+04 8420.265 -5.447 0.000 -6.24e+04 -2.93e+04
MasVnrType_Stone -4.036e+04 8588.088 -4.699 0.000 -5.72e+04 -2.35e+04
ExterQual_Ex -4.123e+04 1.1e+04 -3.740 0.000 -6.29e+04 -1.96e+04
ExterQual_Fa -6.169e+04 1.34e+04 -4.610 0.000 -8.8e+04 -3.54e+04
ExterQual_Gd -7.239e+04 1.05e+04 -6.899 0.000 -9.3e+04 -5.18e+04
ExterQual_TA -7.05e+04 1.06e+04 -6.636 0.000 -9.14e+04 -4.96e+04
ExterCond_Ex -5.448e+04 2.46e+04 -2.212 0.027 -1.03e+05 -6141.822
ExterCond_Fa -5.19e+04 1.27e+04 -4.085 0.000 -7.68e+04 -2.7e+04
ExterCond_Gd -5.102e+04 1.15e+04 -4.454 0.000 -7.35e+04 -2.85e+04
ExterCond_Po -3.775e+04 2.61e+04 -1.447 0.148 -8.9e+04 1.35e+04
ExterCond_TA -5.066e+04 1.14e+04 -4.456 0.000
-7.3e+04 -2.83e+04
Foundation_BrkTil -4.047e+04 8291.313 -4.881 0.000
-5.67e+04 -2.42e+04
Foundation_CBlock -3.47e+04 7883.439 -4.401 0.000
-5.02e+04 -1.92e+04
Foundation_PConc -3.178e+04 7751.205 -4.100 0.000
-4.7e+04 -1.66e+04
Foundation_Slab -4.344e+04 1.42e+04 -3.058 0.002
-7.13e+04 -1.56e+04
Foundation_Stone -3.233e+04 1.33e+04 -2.437 0.015
-5.84e+04 -6285.922
Foundation_Wood -6.31e+04 1.47e+04 -4.293 0.000
-9.2e+04 -3.42e+04
BsmtQual_Ex -4.305e+04 9353.751 -4.602 0.000
-6.14e+04 -2.47e+04
BsmtQual_Fa -5.747e+04 9616.278 -5.976 0.000
-7.63e+04 -3.86e+04
BsmtQual_Gd -5.705e+04 8426.116 -6.771 0.000
-7.36e+04 -4.05e+04
BsmtQual_NA -3.311e+04 1.35e+04 -2.458 0.014
-5.96e+04 -6672.976
BsmtQual_TA -5.513e+04 8441.790 -6.531 0.000
-7.17e+04 -3.86e+04
BsmtCond_Fa -7.017e+04 1.17e+04 -5.998 0.000
-9.31e+04 -4.72e+04
BsmtCond_Gd -7.413e+04 1.13e+04 -6.552 0.000
-9.63e+04 -5.19e+04
BsmtCond_NA -3.311e+04 1.35e+04 -2.458 0.014
-5.96e+04 -6672.976
BsmtCond_Po 1.8e-12 3.37e-11 0.053 0.957
-6.43e-11 6.79e-11
BsmtCond_TA -6.84e+04 1.11e+04 -6.171 0.000
-9.02e+04 -4.66e+04
BsmtExposure_Av -4.876e+04 9414.392 -5.180 0.000
-6.72e+04 -3.03e+04
BsmtExposure_Gd -3.283e+04 9447.762 -3.475 0.001
-5.14e+04 -1.43e+04
BsmtExposure_Mn -5.225e+04 9561.799 -5.464 0.000
-7.1e+04 -3.35e+04
BsmtExposure_NA -5.705e+04 2.06e+04 -2.770 0.006
-9.75e+04 -1.66e+04
BsmtExposure_No -5.492e+04 9266.277 -5.927 0.000
-7.31e+04 -3.67e+04
BsmtFinType1_ALQ -3.758e+04 5869.672 -6.402 0.000
-4.91e+04 -2.61e+04
BsmtFinType1_BLQ -3.474e+04 6126.014 -5.672 0.000
-4.68e+04 -2.27e+04
BsmtFinType1_GLQ -2.95e+04 5871.092 -5.025 0.000
-4.1e+04 -1.8e+04
BsmtFinType1_LwQ -4.168e+04 6430.896 -6.482 0.000
-5.43e+04 -2.91e+04
BsmtFinType1_NA -3.311e+04 1.35e+04 -2.458 0.014
-5.96e+04 -6672.976
BsmtFinType1_Rec -3.578e+04 6246.603 -5.728 0.000
-4.8e+04 -2.35e+04
BsmtFinType1_Unf -3.341e+04 5893.369 -5.670 0.000
-4.5e+04 -2.18e+04
BsmtFinType2_ALQ -2.573e+04 9501.526 -2.708 0.007
-4.44e+04 -7078.853
BsmtFinType2_BLQ -3.711e+04 8668.137 -4.281 0.000
-5.41e+04 -2.01e+04
BsmtFinType2_GLQ -2.313e+04 1.09e+04 -2.127 0.034
-4.45e+04 -1778.841
BsmtFinType2_LwQ -3.209e+04 7830.357 -4.099 0.000
-4.75e+04 -1.67e+04
BsmtFinType2_NA -6.345e+04 2.29e+04 -2.766 0.006
-1.08e+05 -1.84e+04
BsmtFinType2_Rec -3.472e+04 8233.591 -4.217 0.000
-5.09e+04 -1.86e+04
BsmtFinType2_Unf -2.957e+04 8178.636 -3.615 0.000
-4.56e+04 -1.35e+04
Heating_Floor -3.894e+04 2.55e+04 -1.529 0.127
-8.89e+04 1.1e+04
Heating_GasA -3.784e+04 1.02e+04 -3.694 0.000
-5.79e+04 -1.77e+04
Heating_GasW -4.284e+04 1.29e+04 -3.316 0.001
-6.82e+04 -1.75e+04
Heating_Grav -4.316e+04 1.62e+04 -2.661 0.008
-7.5e+04 -1.13e+04
Heating_OthW -6.027e+04 1.95e+04 -3.098 0.002
-9.85e+04 -2.21e+04
Heating_Wall -2.276e+04 1.74e+04 -1.311 0.190
-5.68e+04 1.13e+04
HeatingQC_Ex -4.682e+04 9965.706 -4.698 0.000
-6.64e+04 -2.73e+04
HeatingQC_Fa -5.002e+04 1.09e+04 -4.576 0.000
-7.15e+04 -2.86e+04
HeatingQC_Gd -5.013e+04 9896.970 -5.066 0.000
-6.96e+04 -3.07e+04
HeatingQC_Po -4.834e+04 2.46e+04 -1.964 0.050
-9.67e+04 -22.908
HeatingQC_TA -5.05e+04 9968.064 -5.066 0.000
-7.01e+04 -3.09e+04
CentralAir_N -1.231e+05 1.99e+04 -6.193 0.000
-1.62e+05 -8.41e+04
CentralAir_Y -1.227e+05 1.96e+04 -6.260 0.000
-1.61e+05 -8.42e+04
Electrical_FuseA -6.364e+04 1.19e+04 -5.367 0.000
-8.69e+04 -4.04e+04
Electrical_FuseF -6.411e+04 1.29e+04 -4.960 0.000
-8.95e+04 -3.87e+04
Electrical_FuseP -5.289e+04 1.99e+04 -2.656 0.008
-9.2e+04 -1.38e+04
Electrical_Mix 9.672e-12 1.56e-11 0.621 0.535
-2.09e-11 4.03e-11
Electrical_NA -8.057e-11 1.39e-11 -5.776 0.000
-1.08e-10 -5.32e-11
Electrical_SBrkr -6.517e+04 1.17e+04 -5.584 0.000
-8.81e+04 -4.23e+04
KitchenQual_Ex -4.7e+04 1.09e+04 -4.323 0.000
-6.83e+04 -2.57e+04
KitchenQual_Fa -6.657e+04 1.07e+04 -6.219 0.000
-8.76e+04 -4.56e+04
KitchenQual_Gd -6.617e+04 9946.723 -6.652 0.000
-8.57e+04 -4.66e+04
KitchenQual_TA -6.608e+04 9958.286 -6.636 0.000
-8.56e+04 -4.65e+04
Functional_Maj1 -3.453e+04 1.15e+04 -3.000 0.003
-5.71e+04 -1.19e+04
Functional_Maj2 -5.54e+04 2.39e+04 -2.316 0.021
-1.02e+05 -8438.266
Functional_Min1 -1.489e+04 9543.707 -1.560 0.119
-3.36e+04 3845.523
Functional_Min2 -2.268e+04 9816.878 -2.310 0.021
-4.19e+04 -3404.566
Functional_Mod -2.845e+04 1.23e+04 -2.322 0.021
-5.25e+04 -4395.995
Functional_Sev -7.75e+04 2.9e+04 -2.672 0.008
-1.34e+05 -2.06e+04
Functional_Typ -1.236e+04 8599.272 -1.438 0.151
-2.92e+04 4518.573
FireplaceQu_Ex -4.094e+04 7720.578 -5.302 0.000
-5.61e+04 -2.58e+04
FireplaceQu_Fa -4.94e+04 8662.483 -5.702 0.000
-6.64e+04 -3.24e+04
FireplaceQu_Gd -4.081e+04 7124.232 -5.729 0.000
-5.48e+04 -2.68e+04
FireplaceQu_NA -3.701e+04 8093.018 -4.573 0.000
-5.29e+04 -2.11e+04
FireplaceQu_Po -3.609e+04 9926.115 -3.636 0.000
-5.56e+04 -1.66e+04
FireplaceQu_TA -4.157e+04 7307.789 -5.688 0.000
-5.59e+04 -2.72e+04
GarageType_2Types -4.7e+04 1.28e+04 -3.661 0.000
-7.22e+04 -2.18e+04
GarageType_Attchd -3.617e+04 6284.873 -5.755 0.000
-4.85e+04 -2.38e+04
GarageType_Basment -2.353e+04 9574.261 -2.458 0.014
-4.23e+04 -4736.376
GarageType_BuiltIn -3.574e+04 7178.072 -4.979 0.000
-4.98e+04 -2.16e+04
GarageType_CarPort -2.831e+04 1.08e+04 -2.619 0.009
-4.95e+04 -7092.205
GarageType_Detchd -3.416e+04 6378.969 -5.355 0.000
-4.67e+04 -2.16e+04
GarageType_NA -4.091e+04 7444.073 -5.496 0.000
-5.55e+04 -2.63e+04
GarageFinish_Fin -6.767e+04 1.08e+04 -6.276 0.000
-8.88e+04 -4.65e+04
GarageFinish_NA -4.091e+04 7444.073 -5.496 0.000
-5.55e+04 -2.63e+04
GarageFinish_RFn -6.989e+04 1.08e+04 -6.463 0.000
-9.11e+04 -4.87e+04
GarageFinish_Unf -6.733e+04 1.08e+04 -6.225 0.000
-8.86e+04 -4.61e+04
GarageQual_Ex 4.496e+04 2.86e+04 1.575 0.116
-1.11e+04 1.01e+05
GarageQual_Fa -5.738e+04 1.11e+04 -5.167 0.000
-7.92e+04 -3.56e+04
GarageQual_Gd -4.838e+04 1.31e+04 -3.705 0.000
-7.4e+04 -2.27e+04
GarageQual_NA -4.091e+04 7444.073 -5.496 0.000
-5.55e+04 -2.63e+04
GarageQual_Po -8.699e+04 2.4e+04 -3.631 0.000
-1.34e+05 -4e+04
GarageQual_TA -5.712e+04 1.09e+04 -5.219 0.000
-7.86e+04 -3.56e+04
GarageCond_Ex -1.249e+05 3.43e+04 -3.643 0.000
-1.92e+05 -5.76e+04
GarageCond_Fa -2.359e+04 1.21e+04 -1.955 0.051
-4.73e+04 101.854
GarageCond_Gd -3.256e+04 1.46e+04 -2.227 0.026
-6.13e+04 -3861.166
GarageCond_NA -4.091e+04 7444.073 -5.496 0.000
-5.55e+04 -2.63e+04
GarageCond_Po -5263.4624 1.77e+04 -0.297 0.767
-4.01e+04 2.95e+04
GarageCond_TA -1.857e+04 1.13e+04 -1.647 0.100
-4.07e+04 3563.643
PavedDrive_N -8.098e+04 1.34e+04 -6.058 0.000
-1.07e+05 -5.47e+04
PavedDrive_P -8.179e+04 1.39e+04 -5.893 0.000
-1.09e+05 -5.45e+04
PavedDrive_Y -8.304e+04 1.33e+04 -6.244 0.000
-1.09e+05 -5.69e+04
PoolQC_Ex -6.747e+05 1.43e+05 -4.724 0.000
-9.55e+05 -3.94e+05
PoolQC_Fa -1.426e+06 2.57e+05 -5.554 0.000
-1.93e+06 -9.22e+05
PoolQC_Gd -5.3e+05 4.56e+04 -11.623 0.000
-6.19e+05 -4.4e+05
PoolQC_NA 2.385e+06 4.01e+05 5.947 0.000
1.6e+06 3.17e+06
Fence_GdPrv -5.63e+04 8495.030 -6.628 0.000
-7.3e+04 -3.96e+04
Fence_GdWo -4.766e+04 8931.650 -5.336 0.000
-6.52e+04 -3.01e+04
Fence_MnPrv -4.601e+04 8638.976 -5.326 0.000
-6.3e+04 -2.91e+04
Fence_MnWw -5.044e+04 1.05e+04 -4.801 0.000
-7.11e+04 -2.98e+04
Fence_NA -4.539e+04 8293.653 -5.473 0.000
-6.17e+04 -2.91e+04
MiscFeature_Gar2 -1.693e+05 9.02e+04 -1.876 0.061
-3.46e+05 7807.915
MiscFeature_NA -1.946e+05 4.32e+04 -4.507 0.000
-2.79e+05 -1.1e+05
MiscFeature_Othr -1.898e+05 4.25e+04 -4.469 0.000
-2.73e+05 -1.06e+05
MiscFeature_Shed -1.91e+05 4.05e+04 -4.713 0.000
-2.71e+05 -1.11e+05
MiscFeature_TenC 4.989e+05 1.04e+05 4.794 0.000
2.95e+05 7.03e+05
SaleType_COD -3.041e+04 8440.444 -3.603 0.000
-4.7e+04 -1.38e+04
SaleType_CWD -2.588e+04 1.74e+04 -1.491 0.136
-6e+04 8191.003
SaleType_Con -9857.2627 1.74e+04 -0.567 0.571
-4.4e+04 2.43e+04
SaleType_ConLD 440.8491 1.54e+04 0.029 0.977
-2.98e+04 3.07e+04
SaleType_ConLI -3.587e+04 1.51e+04 -2.378 0.018
-6.55e+04 -6262.081
SaleType_ConLw -3.837e+04 2.3e+04 -1.665 0.096
-8.36e+04 6875.872
SaleType_New -4.578e+04 2.35e+04 -1.950 0.052
-9.19e+04 308.804
SaleType_Oth -2.915e+04 1.5e+04 -1.939 0.053
-5.87e+04 366.832
SaleType_WD -3.093e+04 7181.626 -4.307 0.000
-4.5e+04 -1.68e+04
SaleCondition_Abnorml -4.656e+04 9507.697 -4.897 0.000
-6.52e+04 -2.79e+04
SaleCondition_AdjLand -6.591e+04 2.39e+04 -2.753 0.006
-1.13e+05 -1.89e+04
SaleCondition_Alloca -4.087e+04 1.26e+04 -3.238 0.001
-6.56e+04 -1.61e+04
SaleCondition_Family -4.343e+04 1.14e+04 -3.824 0.000
-6.57e+04 -2.11e+04
SaleCondition_Normal -3.96e+04 9101.143 -4.351 0.000
-5.75e+04 -2.17e+04
SaleCondition_Partial -9438.1306 2.23e+04 -0.424 0.672
-5.32e+04 3.43e+04
LotFrontage 8.6884 29.049 0.299 0.765
-48.335 65.712
LotArea 0.6318 0.152 4.166 0.000
0.334 0.929
OverallQual 7022.2157 1310.943 5.357 0.000
4448.817 9595.614
OverallCond 5754.5954 1060.549 5.426 0.000
3672.723 7836.468
MasVnrArea 19.1000 7.217 2.646 0.008
4.933 33.267
BsmtFinSF1 16.0409 3.597 4.459 0.000
8.979 23.102
BsmtFinSF2 12.3366 7.246 1.702 0.089
-1.888 26.561
BsmtUnfSF -2.4881 3.498 -0.711 0.477
-9.354 4.378
TotalBsmtSF 25.8892 5.117 5.060 0.000
15.845 35.934
1stFlrSF 5.0059 8.537 0.586 0.558
-11.752 21.764
2ndFlrSF 31.6424 7.504 4.217 0.000
16.911 46.373
LowQualFinSF -8.4486 18.824 -0.449 0.654
-45.400 28.503
GrLivArea 28.1997 7.530 3.745 0.000
13.418 42.982
BsmtFullBath 1293.5230 2641.605 0.490 0.625
-3891.985 6479.031
BsmtHalfBath -2004.8464 4010.119 -0.500 0.617
-9876.766 5867.073
FullBath 3573.8997 2852.252 1.253 0.211
-2025.110 9172.910
HalfBath 758.8515 2685.205 0.283 0.778
-4512.242 6029.945
BedroomAbvGr -2490.7808 1767.234 -1.409 0.159
-5959.885 978.324
KitchenAbvGr -1.199e+04 7732.312 -1.551 0.121
-2.72e+04 3186.391
TotRmsAbvGrd 2499.3470 1220.422 2.048 0.041
103.641 4895.053
Fireplaces 6143.7531 3391.361 1.812 0.070
-513.535 1.28e+04
GarageCars 2058.0214 2936.204 0.701 0.484
-3705.787 7821.830
GarageArea 27.5991 9.899 2.788 0.005
8.168 47.030
WoodDeckSF 15.1366 7.414 2.042 0.042
0.582 29.691
OpenPorchSF 5.8240 14.460 0.403 0.687
-22.561 34.209
EnclosedPorch 10.2642 16.089 0.638 0.524
-21.319 41.848
3SsnPorch 39.9566 28.217 1.416 0.157
-15.434 95.348
ScreenPorch 37.6018 15.577 2.414 0.016
7.023 68.180
PoolArea 5948.8991 1011.891 5.879 0.000
3962.543 7935.256
MiscVal -1.7405 6.860 -0.254 0.800
-15.206 11.725
garageTime -7.7199 76.955 -0.100 0.920
-158.784 143.344
timeToSell -297.0211 96.584 -3.075 0.002
-486.618 -107.425
==============================================================================
Omnibus: 312.810 Durbin-Watson: 1.903
Prob(Omnibus): 0.000 Jarque-Bera (JB): 12911.124
Skew: 0.654 Prob(JB): 0.00
Kurtosis: 20.363 Cond. No. 7.49e+16
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 3.83e-23. This might indicate that there are strong multicollinearity problems or that the design matrix is singular.
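Note [2] and the condition number of 7.49e+16 indicate an exactly rank-deficient design matrix, and the coefficient table shows the symptoms: BsmtQual_NA, BsmtCond_NA and BsmtFinType1_NA have identical estimates and standard errors (presumably because the same houses, those without a basement, are NA in all three columns), the four Garage*_NA dummies repeat one another, and RoofMatl_Roll, Exterior1st_AsphShn and Electrical_NA have coefficients of order 1e-09 or smaller. Assuming the categoricals were one-hot encoded keeping every level alongside an intercept (the encoding step is not shown in this section), the dummies of each variable also sum to the constant column, the so-called dummy-variable trap. statsmodels still returns estimates because `OLS.fit()` uses a pseudo-inverse by default, but the individual dummy coefficients are then not identifiable. The toy sketch below, on a hypothetical six-row frame, checks the column rank with and without `drop_first=True` in `pd.get_dummies`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: one categorical and one numeric predictor (not the real data)
df_toy = pd.DataFrame({
    "CentralAir": ["Y", "N", "Y", "Y", "N", "Y"],
    "GrLivArea": [1710, 1262, 1786, 1717, 2198, 1362],
})


def design_rank(drop_first):
    """Return (number of columns, numerical rank) of the design matrix with an intercept."""
    X = pd.get_dummies(df_toy, columns=["CentralAir"], drop_first=drop_first, dtype=float)
    X.insert(0, "const", 1.0)  # explicit intercept column
    return X.shape[1], int(np.linalg.matrix_rank(X.to_numpy(dtype=float)))


# Keeping every level: CentralAir_N + CentralAir_Y equals const, so one column is redundant
print(design_rank(drop_first=False))  # (4, 3)
# Dropping the first level restores full column rank
print(design_rank(drop_first=True))   # (3, 3)
```

The same rank check can be run on the real training design matrix before fitting; a rank lower than the number of columns means some coefficients cannot be interpreted individually.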
[Figure: Residual plot]
residuals = y_test - lr.predict(X_test[best_feat])
residuals
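from scipy import stats  # probplot lives in scipy.stats, which is not imported at the top of the notebook
plt.figure(figsize=(7,7))
stats.probplot(residuals, dist="norm", plot=plt)  # sample quantiles vs. normal quantiles
plt.title("Normal Q-Q Plot")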
[Figure: Normal Q-Q Plot]
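The Q-Q plot is the visual normality check; the Jarque-Bera line of the training summary is its numeric counterpart. JB is computed from the residual skewness S and kurtosis K as JB = n/6 · (S² + (K − 3)²/4), so a kurtosis of 20.363 (far above the normal value of 3) alone drives JB to 12911.124 and Prob(JB) to 0.00, i.e. heavy-tailed residuals. Inverting the formula with the printed, rounded values gives n ≈ 1022, which would be consistent with a 70/30 split of the 1460 rows; the split itself is not shown in this section, so treat that as an inference. The sketch below (the helper name `jarque_bera` is ours) also cross-checks the formula against `scipy.stats.jarque_bera` on synthetic heavy-tailed data.

```python
import numpy as np
from scipy import stats


def jarque_bera(resid):
    """JB = n/6 * (S^2 + (K - 3)^2 / 4), S = skewness, K = Pearson kurtosis."""
    resid = np.asarray(resid, dtype=float)
    n = resid.size
    s = stats.skew(resid)                    # biased sample skewness
    k = stats.kurtosis(resid, fisher=False)  # biased Pearson kurtosis (normal = 3)
    return n / 6.0 * (s**2 + (k - 3.0) ** 2 / 4.0)


# Back out the sample size implied by the printed summary values (rounded in the output)
skew, kurt, jb = 0.654, 20.363, 12911.124
n_implied = 6.0 * jb / (skew**2 + (kurt - 3.0) ** 2 / 4.0)
print(round(n_implied))  # 1022

# Cross-check against SciPy on synthetic heavy-tailed data (Student t, 3 d.o.f.)
rng = np.random.default_rng(0)
x = rng.standard_t(df=3, size=500)
jb_scipy, _ = stats.jarque_bera(x)
print(np.isclose(jarque_bera(x), jb_scipy))  # True
```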
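# Scale-Location plot: sqrt(|standardized residuals|) against fitted values
fitted = lr.predict(X_test[best_feat])
model_norm_residuals = residuals / np.std(residuals, ddof=1)  # standardize before taking sqrt(|.|)
model_norm_residuals_abs_sqrt = np.sqrt(np.abs(model_norm_residuals))
plt.figure(figsize=(7,7))
# recent seaborn versions require x and y as keyword arguments
seaborn.regplot(x=fitted, y=model_norm_residuals_abs_sqrt,
                scatter=True,
                lowess=True,
                line_kws={'color': 'red', 'lw': 1, 'alpha': 0.8})
plt.ylabel("sqrt(|Standardized residuals|)")
plt.xlabel("Fitted value")
plt.title('Scale-Location plot')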
[Figure: Scale-Location plot]
3.1 Determining the size of the test sample (0.5)
3.2 Performance measures for the training and test samples: R², RMSE, MAE (2.0)
3.3 Description of the final model, discussion of the significance obtained for the coefficients, residual analysis (1.5)
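Item 3.2 asks for R², RMSE and MAE on both samples. A minimal sketch with scikit-learn's metric functions, assuming scikit-learn is installed (it is not imported elsewhere in this section); RMSE is taken as the square root of `mean_squared_error` to avoid depending on version-specific options. The prices below are made-up toy values, not results from the dataset. In the notebook it would be called once per sample, e.g. `regression_report(y_test, lr.predict(X_test[best_feat]))`, and analogously with training-set names such as `X_train`/`y_train`, which are hypothetical here.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_report(y_true, y_pred):
    """Return R^2, RMSE and MAE for one sample (training or test)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return {
        "R2": r2_score(y_true, y_pred),
        "RMSE": float(np.sqrt(mean_squared_error(y_true, y_pred))),
        "MAE": float(mean_absolute_error(y_true, y_pred)),
    }


# Toy check: every prediction is off by exactly 10,000
y_true = [200000, 150000, 300000, 250000]
y_pred = [210000, 140000, 290000, 260000]
report = regression_report(y_true, y_pred)
print(report)  # R2 = 0.968, RMSE = 10000.0, MAE = 10000.0
```

Comparing the training and test rows of such a report is the usual way to spot overfitting: a training R² much higher than the test R² suggests the model memorizes the training sample.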